For an annotated markdown of the analysis code, see: https://sbresnahan.github.io/allele-specific-transcription-and-m6A/reports/PSm6A.html
The “src” directory contains files that are required for this analysis:
metadata.csv contains the sample metadata for each
libraryAmel_HAv3.1_genes.bed was created using the RefSeq gene
annotation for Amel_HAv3.1 GCF_003254395.2_Amel_HAv3.1_genomic.gff
via BEDOPS
gff2bedAdditionally, the “results” directory contains files output by previous scripts that are required for this analysis:
coverage_matrix_2022-08-03.txt contains the read counts
at parent SNP positions and is output from
src/combine_results_2022-08-03.RprobM_matrix_2022-08-03.txt contains the RNA m6A
probability at parent SNPs and is output from
src/combine_results_2022-08-03.RlincRNA_txpts.txt contains the transcript IDs of
lincRNAs identified by src/lincRNAs_2022-07-01.RDElincRNAs.csv contains the gene IDs of differentially
expressed lincRNAs identified by
src/lincRNAs_2022-07-01.Rsig_switches_by_phenotype.txt contains the the gene IDs
showing significant isoform switches identified by
src/isoformSwitchAnalyzeR_2022-07-07.RThe “results” folder contains results from this analysis:
EHBxAHB_PS_expression.csv contains the read counts at
parent SNPs within transcriptsEHBxAHB_responsive_counts.csv contains the read counts
at parent SNPs within transcripts expressed in aggressive beesEHBxAHB_responsiveSK.csv contains test results of
Storer-Kim binomial exact tests at each parent SNP within transcripts
expressed in aggressive beesEHBxAHB_responsiveGLIMMIX.csv contains test results of
GLIMMIX models fit for each transcript expressed in aggressive beesEHBxAHB_unresponsive_counts.csv contains the read
counts at parent SNPs within transcripts expressed in non-aggressive
beesEHBxAHB_unresponsiveSK.csv contains test results of
Storer-Kim binomial exact tests at each parent SNP within transcripts
expressed in non-aggressive beesEHBxAHB_unsponsiveGLIMMIX.csv contains test results of
GLIMMIX models fit for each transcript expressed in non-aggressive
beesEHBxAHB_PS_m6A.csv contains the RNA m6A probability at
parent SNPs within transcriptsEHBxAHB_responsive_PSm6A.csv contains the RNA m6A
probability at parent SNPs within transcripts expressed in aggressive
beesEHBxAHB_responsiveZ.csv contains test results of
two-tailed unpooled Z-tests at each parent SNP within transcripts
expressed in aggressive beesEHBxAHB_unresponsive_PSm6A.csv contains the RNA m6A
probability at parent SNPs within transcripts expressed in
non-aggressive beesEHBxAHB_unresponsiveZ.csv contains test results of
two-tailed unpooled Z-tests at each parent SNP within transcripts
expressed in non-aggressive beesAdditionally, all scripts and code for the full analysis are available at https://github.com/sbresnahan/allele-specific-transcription-and-m6A.
This study seeks to use transcriptome data to understand behavioral differences between non-aggressive and aggressive honey bees. Two crosses were made. Cross A had AHB as the female parent (SRR14654188) and EHB as the male parent (SRR14654189). Cross B had EHB as the female parent (SRR14654186) and AHB as the male parent (SRR14654187). Three or more aggressive bees and three or more non-aggressive bees (labeled C and T) were collected from each cross. Twelve head and abdomen samples, spanning multiple batches over the course of a year, were ultimately analyzed, with three samples per experimental group. RNA samples were sequenced with Oxford Nanopore Technology direct RNA sequencing on a FLO-MIN106 flowcell.
In this report I describe the detection of and relationship between allele-specific transcription and RNA m6A, as well as their relationship to gene expression and differential isoform usage.
FAST5 files from all samples were processed with Guppy 5.0.16 using
the rna_r9.4.1_70bps_hac.cfg configuration (Oxford Nanopore
Technologies) for base calling on a GPU. FASTQ files labeled “pass” were
concatenated together to generate one FASTQ per sample. Minimap 2.211 was used for alignment
to the Apis
mellifera HAv3.1 reference genome (plus common viral genomes)
in splice-aware mode with a kmer length of 14, using the published gene
annotation to guide alignment. SAMtools 1.122 was used
to sort and index the resulting BAM files. Flair 1.53 was then
used to convert BAM files to BED format and correct intron-exon
junctions based on the published annotation, then call isoforms across
all samples. These isoforms included published transcripts and genes as
well as novel transcripts and genes.
Finally, EpiNano4 was used
to quantify read coverage and RNA m6A probability at parent SNPs within
transcripts. combined with a custom R script
(src/combine_results_2022-08-03.R) to subset the EpiNano
output to the A sites at the center of RRACH motifs. Read coverage and
m6A probability at each position within published and novel transcripts,
separated by allele, were then combined to generate read count and m6A
probability matrices.
Transcripts with \(n < 2\) positions are filtered from the datasets. Storer-Kim binomial exact tests and generalized linear mixed effects models with interaction terms (GLIMMIX) are used to identify genes that exhibited parent-of-origin biased allelic expression following protocols for identifying significant variation in parent-of-origin gene expression described previously. 5 6 7
We identified a total of 31,078 unique transcripts in our samples, including 3,640 lincRNAs. Of these, 3,081 (9.91%) were shared between both crosses and sufficiently varied in their sequence composition between the parents of each cross, allowing for identification of parent-of-origin reads in the offspring. Specifically, we identified 35,782 SNP positions within transcripts that had at least 2 SNPs. Of the 3,081 transcripts, 2,584 were from the published annotation, and 497 were novel. After filtering SNP positions with low read counts in the offspring, our dataset contained read counts at 18,674 positions distributed among 1,928 transcripts in non-aggressive bees and 33,494 positions distributed among 2,820 transcripts in aggressive bees.
We identified 307 transcripts that were consistently biased in both crosses, including 228 from previously annotated genes, 66 novel transcripts, and 13 lincRNAs. Some transcripts showed biased allelic expression in both non-aggressive and aggressive bees (\(n = 11\)), whereas others were only biased in one group (non-aggressive only, \(n = 54\); aggressive only, \(n = 242\)). In support of our hypothesis, we found that paternally expressed genes were enriched in aggressive bees. Interestingly, maternally expressed genes were also enriched in aggressive bees, although there were fewer maternally expressed genes.
A Two-tailed, unpooled z-test of two proportions was conducted for each transcript for each SNP position to test for differences between maternal and paternal m6A probability among the samples. Z-test results were then FDR corrected and aggregated by transcript. For each transcript, all positions were required to exhibit the same direction of parent- or lineage-of-origin bias as described above.
We identified 86 transcripts that showed consistent parent or lineage biases in RNA m6A in both crosses, including 71 from previously annotated genes, 14 novel transcripts, and 1 lincRNA. Some transcripts showed biased RNA m6A in both non-aggressive and aggressive workers (\(n = 9\)), whereas others were only biased in one group (non-aggressive only, \(n = 32\); aggressive only, \(n = 45\)). In contrast to transcription, there was no relationship between parent-specific RNA m6A and offspring aggression.
We find few genes that show both parent-of-origin allele-specific transcription and m6A, and few that show either that were differentially expressed and none that exhibited isoform switching.
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] doSNOW_1.0.20 snow_0.4-4 doParallel_1.0.17 iterators_1.0.14
## [5] foreach_1.5.2 kableExtra_1.3.4 gridExtra_2.3 viridis_0.6.2
## [9] viridisLite_0.4.1 car_3.1-1 carData_3.0-5 lmerTest_3.1-3
## [13] lme4_1.1-31 Matrix_1.5-1 tryCatchLog_1.3.1 Rfast_2.0.6
## [17] RcppZiggurat_0.1.6 Rcpp_1.0.9 plyr_1.8.8 forcats_0.5.2
## [21] stringr_1.5.0 dplyr_1.0.10 purrr_1.0.1 readr_2.1.3
## [25] tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-160 fs_1.5.2 lubridate_1.9.0
## [4] webshot_0.5.4 httr_1.4.4 numDeriv_2016.8-1.1
## [7] tools_4.2.2 backports_1.4.1 bslib_0.4.2
## [10] utf8_1.2.2 R6_2.5.1 DBI_1.1.3
## [13] colorspace_2.0-3 withr_2.5.0 tidyselect_1.2.0
## [16] compiler_4.2.2 cli_3.6.0 rvest_1.0.3
## [19] xml2_1.3.3 sass_0.4.4 scales_1.2.1
## [22] systemfonts_1.0.4 digest_0.6.31 minqa_1.2.5
## [25] rmarkdown_2.19 svglite_2.1.1 pkgconfig_2.0.3
## [28] htmltools_0.5.4 highr_0.10 dbplyr_2.2.1
## [31] fastmap_1.1.0 rlang_1.0.6 readxl_1.4.1
## [34] rstudioapi_0.14 jquerylib_0.1.4 generics_0.1.3
## [37] jsonlite_1.8.4 googlesheets4_1.0.1 magrittr_2.0.3
## [40] munsell_0.5.0 fansi_1.0.3 abind_1.4-5
## [43] lifecycle_1.0.3 stringi_1.7.12 yaml_2.3.6
## [46] MASS_7.3-58.1 grid_4.2.2 crayon_1.5.2
## [49] lattice_0.20-45 haven_2.5.1 splines_4.2.2
## [52] hms_1.1.2 knitr_1.41 pillar_1.8.1
## [55] boot_1.3-28 codetools_0.2-18 reprex_2.0.2
## [58] glue_1.6.2 evaluate_0.19 modelr_0.1.10
## [61] vctrs_0.5.1 nloptr_2.0.3 tzdb_0.3.0
## [64] cellranger_1.1.0 gtable_0.3.1 assertthat_0.2.1
## [67] cachem_1.0.6 xfun_0.36 broom_1.0.2
## [70] googledrive_2.0.0 gargle_1.2.1 timechange_0.2.0
## [73] ellipsis_0.3.2
Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191↩︎
Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li (2021). Twelve years of SAMtools and BCFtools. GigaScience, Volume 10, Issue 2, giab008, https://doi.org/10.1093/gigascience/giab008↩︎
Tang, A.D., Soulette, C.M., van Baren, M.J. et al. Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat Commun 11, 1438 (2020). https://doi.org/10.1038/s41467-020-15171-6↩︎
Liu, H., Begik, O., Lucas, M.C. et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat Commun 10, 4079 (2019). https://doi.org/10.1038/s41467-019-11713-9↩︎
Sarah D Kocher, Jennifer M Tsuruda, Joshua D Gibson, Christine M Emore, Miguel E Arechavaleta-Velasco, David C Queller, Joan E Strassmann, Christina M Grozinger, Michael R Gribskov, Phillip San Miguel, Rick Westerman, Greg J Hunt, A Search for Parent-of-Origin Effects on Honey Bee Gene Expression, G3 Genes|Genomes|Genetics, Volume 5, Issue 8, 1 August 2015, Pages 1657–1662, https://doi.org/10.1534/g3.115.017814↩︎
Galbraith, D. A., Kocher, S. D., Glenn, T., Albert, I., Hunt, G. J., Strassmann, J. E., … & Grozinger, C. M. (2016). Testing the kinship theory of intragenomic conflict in honey bees (Apis mellifera). Proceedings of the National Academy of Sciences, 113(4), 1020-1025. https://doi.org/10.1073/pnas.1516636113↩︎
Galbraith, D. A., Ma, R., & Grozinger, C. M. (2021). Tissue‐specific transcription patterns support the kinship theory of intragenomic conflict in honey bees (Apis mellifera). Molecular ecology, 30(4), 1029-1041. https://doi.org/10.1111/mec.15778↩︎